1  Abstract


We  have  made  progress  in  several  areas  this  quarter: 

•  Considerable progress was made in both the high-level and low-level software aspects of
the project.

•  We have begun work on a CNS-0 system, to be based on the nearly completed T0
processor.

•  We have continued to make progress in the application of analog VLSI to speech
pre-processors.

The  project  continues  to  have  a  significant  effect  on  the  education  of  graduate  and 
undergraduate  students  at  our  institution.  There  are  currently  16  Ph.D.,  1  M.S.,  and  2 
B.S.  students  associated  with  the  project  (some  are  paid  through  supporting  agencies  other 
than  the  ONR). 


2  Technical Status

2.1  Software and Applications

High-level  software.  Major  progress  was  achieved  in  the  high-level  software  section  of 
the  project.  The  Beta  release  of  Sather  1.0  has  been  successfully  ported  to  a  number 
of  platforms  and  acceptance  is  very  good.  The  parallel  version,  pSather,  has  also  made 
excellent  progress;  a  machine  independent  run-time  interface  has  been  specified  and  is  being 
implemented.  The  new  implementation  also  provides  support  for  the  ambitious  monitoring 
and  debugging  system  developed  by  Mark  Minas  [minas]. 

An  important  aspect  of  the  software  part  of  the  project  is  the  detailed  analysis  of  the 
CNS  architecture  for  a  variety  of  problems  of  interest.  In  the  last  quarter  we  began  the 
study  of  how  the  Torrent  Architecture  and  CNS  can  be  applied  to  image  understanding. 
The  results  were  very  encouraging  and  we  will  extend  these  studies  under  funding  from 
B. Yoon of ARPA. We have also advanced our studies of neural net architectures on CNS.
Ben  Gomes  has  completed  the  first  draft  of  his  thesis  proposal,  which  covers  the  parallel 
implementation  of  neural  networks. 

Low-level software. The work in system software for CNS has concentrated on producing
an environment for application and library development. The T0 kernel software and
workstation server program have both been refined and now provide a comprehensive
environment for running users' application programs, including full IEEE floating point
emulation, a comprehensive range of system calls, and hooks for debugging and process
monitoring. The debugging hooks have allowed the gdb debugger to be ported to the
SPERT system, and additions have been made to support debugging of vector code. To
complement the floating point instruction emulation provided by the kernel (which allows
arbitrary MIPS architecture floating point code to run), a library of IEEE single precision
vector floating point operations is currently undergoing final testing. These will give
reasonably high performance (approximately 15 Mflops) for vectorized code and will be a
useful stepping stone when converting applications from scalar floating point to vector
fixed point. As in previous quarters, there have been the usual background tasks of fixed
point library development and test environment support for the VLSI work.
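
To illustrate what a conversion from floating point to fixed point involves, the following
is a minimal sketch only. It assumes a Q15 format (1 sign bit, 15 fraction bits); the
report does not state which fixed point formats the libraries use, and the names
`to_fixed` and `fixed_dot` are hypothetical.

```python
# Assumed format for illustration: Q15 (values in [-1, 1) stored as 16-bit integers).
Q = 15
SCALE = 1 << Q


def to_fixed(x):
    """Quantize a float to a Q15 integer, saturating at the 16-bit limits."""
    v = int(round(x * SCALE))
    return max(-SCALE, min(SCALE - 1, v))


def fixed_dot(a, b):
    """Dot product of two Q15 vectors using integer multiply-accumulate.

    Each product is a Q30 value; the sum is rescaled to a float at the end
    so it can be compared against the floating point reference.
    """
    acc = sum(x * y for x, y in zip(a, b))
    return acc / float(SCALE * SCALE)


weights = [0.5, -0.25, 0.125]
inputs = [0.75, 0.5, -0.5]

# Floating point reference and its fixed point counterpart.
float_result = sum(w * x for w, x in zip(weights, inputs))
fixed_result = fixed_dot([to_fixed(w) for w in weights],
                         [to_fixed(x) for x in inputs])
print(float_result, fixed_result)
```

Values outside the representable range saturate rather than wrap, which is the usual
choice for signal processing code; whether a given application tolerates the
quantization error has to be checked case by case.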


Speech  application.  We  have  started  working  on  speech  search  algorithms  that  are  more 
vectorizable than the usual frame-synchronous Viterbi beam search. A number of these
are variations of the priority-queue-based approach that is commonly given the misnomer
of  “stack  decoding”.  The  one  we  have  been  working  on  most  recently  is  an  application 
of  simulated  annealing  to  the  search.  This  should  be  both  very  vectorizable  and  very 
parallelizable.  We  also  have  work  in  progress  on  N-best  approaches,  which  will  also  be 
important  for  large  vocabulary  recognition  implementations  on  parallel  machines. 
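
As a rough illustration of the priority queue idea mentioned above, the sketch below
performs a best-first search over a toy word lattice. It is not the project's decoder:
the lattice, its words, and its costs are invented for the example, and the function
name `best_first_decode` is hypothetical. Costs stand for negative log probabilities,
so they are non-negative and the cheapest complete path is the first one popped.

```python
import heapq


def best_first_decode(lattice, start, goal):
    """Return (cost, path) for the cheapest path from start to goal, or None.

    Partial hypotheses sit in a priority queue ordered by accumulated cost,
    and the cheapest one is always extended next.
    """
    queue = [(0.0, [start])]
    expanded = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in expanded:
            continue  # a cheaper hypothesis already passed through this node
        expanded.add(node)
        for nxt, step_cost in lattice.get(node, []):
            heapq.heappush(queue, (cost + step_cost, path + [nxt]))
    return None


# Toy lattice with made-up costs: node -> list of (next node, cost).
lattice = {
    "<s>": [("five", 1.2), ("nine", 0.9)],
    "five": [("</s>", 0.3)],
    "nine": [("</s>", 0.8)],
}

best_cost, best_path = best_first_decode(lattice, "<s>", "</s>")
print(best_cost, best_path)
```

Note that "nine" looks cheaper after the first step but loses once the full path is
scored. A practical decoder also has to compare hypotheses that cover different amounts
of input, which this toy example sidesteps.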

2.2  Hardware  Development 

CNS Systems. This quarter we began detailed design work on a machine we are calling
CNS-0. As with the original CNS-1 design, this machine is based on the Torrent processor
and is structured in a barrel topology. However, this machine will employ the soon to be
completed T0 processor [asanovic]. Unlike the T1 design, this processor does not have the
network interface and router integrated on the chip. Rather, the CNS-0 system network will
be built using commercially available field programmable logic components. Because this
system will be constructed using available components, it will be completed sooner than if
we were to wait for another chip design cycle. The CNS-0 construction will allow earlier
development of system and application software than would have been possible otherwise.
This software will carry over to the CNS-1 system and later CNS systems. Work this quarter
has focused on detailed network interface and router design using FPGA components.

In parallel with this effort we have continued the detailed design of the CNS-1 machine.
Tim Callahan's recently completed Master's thesis reports on the details of the
processor-network interface [callahan].


Rambus/communication test chip. Testing of the third version of the communication
test chip was successful at 210 MHz. This wraps up an important design area in low voltage
swing signalling methods.

A new test chip, using full voltage swing signalling and a simplified frequency locking
technique, has been designed and is now being fabricated at MOSIS. A new circuit board is also
in  fabrication,  and  should  allow  testing  the  chips  during  the  next  quarter.  The  full  voltage 
swing  signalling  method  provides  several  advantages  and  disadvantages  for  multiprocessor 
systems.  The  primary  reason  to  design  and  fabricate  the  new  chips  is  to  see  whether  the 
disadvantages  will  be  important  in  a  working  system. 


SPERT board & T0 Chip issues. Detailed simulations of the interface between T0
and the memory on the SPERT board have been completed. Although the memory subsystem
was designed early on, several timing parameters could only be estimated due to the
incomplete T0 chip design. With T0 nearing tapeout, final adjustments are being made to
the clock controller design and the I/O interface on the SPERT board.


2.3  Analog  VLSI  pre-processors 

The  analog  auditory  pre-processor  effort  has  several  developments  to  report  this  quarter. 
As  previously  reported,  we  have  been  evaluating  our  auditory  pre-processor  chip  in  a  speech 
recognition  task,  using  a  commercial  speech  recognition  toolkit  (HTK)  as  a  recognizer;  this 
quarter the evaluation reached its conclusion. For a 200-speaker, telephone-quality, isolated
digits database, we found error rates about 3 to 4 times as high as with traditional front ends
(about 97% correct for traditional front ends, about 89% correct for our chip).
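
The relation between the accuracies and the error rate ratio quoted above can be checked
with a few lines of arithmetic. The variable names are illustrative, and because the
inputs are the rounded percentages from the text, the result is only approximate.

```python
# Accuracies as quoted in the text (rounded figures).
traditional_correct = 0.97
chip_correct = 0.89

# Error rate is the complement of the fraction correct.
traditional_error = 1.0 - traditional_correct
chip_error = 1.0 - chip_correct

# How many times higher the chip's error rate is.
ratio = chip_error / traditional_error
print(f"traditional: {traditional_error:.0%}, chip: {chip_error:.0%}, ratio: {ratio:.1f}x")
```

An 8 point gap in accuracy therefore corresponds to an error rate between 3 and 4 times
higher, which is why the two ways of stating the result sound so different.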

In  examining  the  errors  made  by  our  chip,  we  found  that  most  errors  occurred  in  the 
confusions  of  starting  consonants  in  a  word — for  example  “five”  and  “nine”  were  most 
often  confused.  A  visual  inspection  of  chip  output  confirmed  the  paucity  of  information  for 
broadband,  brief  consonants  like  “f” — this  is  to  be  expected  in  a  periodicity-based  spectral 
representation.  However,  the  same  chip,  loaded  with  different  parameters,  can  compute 
other  representations  that  serve  as  excellent  detectors  for  transient  events.  Clearly,  the  use 
of  multiple  chips  tuned  to  different  representations  would  lead  to  better  performance — a 
view  shared  by  many  auditory  scene  analysis  researchers. 

To facilitate building systems with several auditory pre-processors, we developed an
extension to the address-event communications protocol we use in our auditory pre-processors.
The  extension  permits  many  address-event  communications  ports  to  share  a  common  bus, 
without  needing  additional  chips  for  bus  management.  This  quarter,  we  evaluated  a  test 
chip  for  this  protocol  extension;  the  scheme  worked  as  expected. 

We are now designing a new version of our auditory pre-processor, adding this
address-event extension; tapeout is planned for August 31. This chip will allow us to evaluate the
use  of  several  different  auditory  representations  simultaneously  in  our  speech  recognition 
application. 